TECHNICAL REPORT NO. 7
NETWORKS OF ASSOCIATION

The search for associations is essentially a search for ideas, although it will frequently be a preliminary step in the search for specific documents. Hence, the association "machine" need not perform as an index; more important, it must not fail to show a relation between concepts simply because there is no single document exhibiting such a relation. Without a document about ABC, the relation between A, B, and C may have to be deduced from three documents dealing with AB, BC, and AC. The association system should, nevertheless, show that A, B, and C are all related; it remains for the index to tell for which combinations of these terms there are documents in the system.

Since the existence of documents indexed by particular combinations is discoverable through the index and not through the association machine, we believe that the mechanism of association should presently be limited to networks of two-term associations. As pointed out in Technical Report No. 5,(1) the listing of three-term combinations would require a device many times more complicated than that required to give two-term association. Whether or not any of the additional effort is justifiable depends on whether the searcher would be aided by having the association machine report combinations of terms which have all been used together to index one or more reports, rather than combinations of terms which are closely related. This remains to be seen in practice. It might actually be a disadvantage to have an association system which would present A, B, and C only if a document indexed ABC exists.

(1) "The Preparation of Manual Dictionaries of Association," Technical Report No. 5, Documentation Incorporated, April 1954, p. 5.

The important function of the association machine is to present to the searcher, for his selection, those terms which are sufficiently closely related to each other to form a reasonable basis for a study or literature search in any desired combination. That this function can be performed without going beyond two-term associations can easily be demonstrated.

A searcher using the punched card system described in the previous report first selects the card representing a key term, A, in which he is interested. The positions punched on this card indicate the terms which are associated with it. He then selects any of these, say B, and superimposes the card for B on that of A. Since each term is coded in the same position on every card, the holes which now show through both cards indicate the terms which are associated with both A and B. This process of selection and superimposition can be continued as long as desired. The terms selected by this process form a network, any two of which are associated in some document. This can be represented graphically as follows, each chord indicating an association:
[Figure: network of associated terms; each chord indicates a two-term association.]

Such a network of two-term associations should certainly fulfill the conditions of the search. Clearly, these terms are all closely related (whether or not one document contains all of them). The mechanical display of such networks of association effectively answers the challenge of Vannevar Bush and provides the "coincidences" of ideas which Bernier called the most important characteristic of an information system.
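The select-and-superimpose procedure described above can be sketched in Python by treating each term card as a bit mask in which bit i stands for the dedicated position of term number i; laying cards on top of one another then amounts to a bitwise AND. The miniature vocabulary, the association pairs, and the `choose` callback standing in for the searcher are all hypothetical, chosen only to make the sketch runnable.

```python
from itertools import combinations

# Hypothetical miniature dictionary: term number i owns bit i on every card.
TERMS = ["air", "ducts", "icing", "heat", "valves", "pumps"]
POS = {t: i for i, t in enumerate(TERMS)}

# Hypothetical two-term associations; each pair is punched on both cards.
PAIRS = [("air", "ducts"), ("air", "icing"), ("air", "heat"), ("air", "valves"),
         ("ducts", "icing"), ("ducts", "heat"), ("ducts", "valves"),
         ("ducts", "pumps"), ("icing", "heat")]


def build_cards(pairs):
    """Punch position b on card a, and position a on card b, for every pair."""
    cards = {t: 0 for t in TERMS}
    for a, b in pairs:
        cards[a] |= 1 << POS[b]
        cards[b] |= 1 << POS[a]
    return cards


def superimpose(cards, selected):
    """Light shows through a position only if every selected card is punched there."""
    mask = ~0  # all positions open before any card is laid down
    for t in selected:
        mask &= cards[t]
    return mask


def holes(mask):
    """Read off the terms whose positions show through."""
    return [t for t in TERMS if (mask >> POS[t]) & 1]


def search(cards, start, choose):
    """Repeat selection and superimposition; `choose` plays the searcher and
    returns one of the displayed terms, or None to stop."""
    network = [start]
    while True:
        pick = choose(holes(superimpose(cards, network)))
        if pick is None:
            return network
        network.append(pick)


cards = build_cards(PAIRS)
print(holes(superimpose(cards, ["air", "ducts"])))  # ['icing', 'heat', 'valves']
net = search(cards, "air", lambda shown: shown[0] if shown else None)
print(net)  # ['air', 'ducts', 'icing', 'heat']
# Every pair in the resulting network is associated.
print(all((cards[a] >> POS[b]) & 1 for a, b in combinations(net, 2)))  # True
```

Passing the choice in as a function keeps the selection of the next term outside the machine; the loop only superimposes and displays.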


Since we are here proposing that mechanical association of ideas is to be achieved by the superimposition of dedicated positions in a set of cards or plates, a mechanical dictionary of associations can be either a Batten system, in which each term of the dictionary is a card or plate, or a system of language elements. Whether term cards or language element cards are used, the body of the cards will contain the same pattern of dedicated positions for all the terms in the system. The actual punching or use of a dedicated position will, of course, indicate an actual association in the system of the term designating the card (or the term made up from a set of language element cards) with the term punched on the card (or set of cards).

In the indexing machine, a hole common to two cards indicated a document as a member of the class which is the logical product of the classes designated by the two cards. A hole on the air card at position 475 and a hole on the ducts card at position 475 indicate that item 475 concerns air ducts. In the association machine, a hole on the air card at position 475 and a hole on the ducts card at position 475 indicate that the term in the system which is numbered 475, say icing, is associated with air and with ducts, (A, D)*I. Note that, in accordance with the analysis of the logic of association in Technical Report Number 4,(2) the associations A*I and D*I do not tell us whether or not there is in the system the association A*I*D. That is, we are not told by the association machine whether any particular item is a member of the logical product A*I*D (air * icing * ducts).

(2) "An Extension of the Algebra of Classes for the Association of Ideas," Technical Report No. 4, Documentation Incorporated, April 1954.
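The difference in what a shared hole means can be made concrete with Python sets standing in for punched positions. The three-item index below is hypothetical; it is arranged so that air, icing, and ducts are associated in pairs (the AB, BC, AC case from the opening of this report) while no single item is indexed by all three.

```python
# Hypothetical three-item index (item number -> terms used to index it).
index = {
    475: {"air", "ducts"},
    101: {"air", "icing"},
    102: {"ducts", "icing"},
}
# Hypothetical term numbers; only icing = 475 comes from the text.
term_number = {"air": 1, "ducts": 2, "icing": 475}

# Indexing machine: a term card is punched at the number of every item it indexes.
indexing_card = {t: {d for d, ts in index.items() if t in ts} for t in term_number}

# Association machine: a term card is punched at the number of every term
# that shares at least one item with it.
association_card = {
    t: {term_number[u] for ts in index.values() if t in ts for u in ts if u != t}
    for t in term_number
}

print(indexing_card["air"] & indexing_card["ducts"])        # {475}: item 475 is about air ducts
print(association_card["air"] & association_card["ducts"])  # {475}: term 475, icing
# The pairwise associations do not imply an item indexed air * icing * ducts:
print([d for d, ts in index.items() if {"air", "icing", "ducts"} <= ts])  # []
```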

Although we are confident that the search of any system of information for networks of association should be some sort of machine process, we must not lose sight of the fact that a manual dictionary with each term in the system denoted on a page (cf. Exhibits 1-2, Technical Report No. 5) will give us all associations of the terms in the system with any given term, just as a single Uniterm card will give us all the items which are members of the class denoted by the Uniterm. It is only when we wish to coordinate material on one page with material on another that the problem of mechanization becomes germane.

All the problems which were treated in the discussion of the indexing machine must also be considered with reference to the association machine, namely the number of cards or sheets, the number of dedicated positions, the percentage of use of dedicated positions, and the probability of false drops. We will discuss each of these in turn. There is, however, an additional problem in the association machine which does not affect the indexing machine: the association machine must display the associated terms at every stage of the machine's operation. The pattern of lights displayed by the indexing machine at the conclusion of a search represents numbers, and these numbers can be reproduced automatically on a tape as the scanning frame on the indexing machine passes over the light dots on the screen.


In the operation of the association machine, the selection of terms at successive steps in the associative process is made from the set of terms displayed at the prior step. This selection determines the associations displayed by the subsequent operations of the machine and is essentially a "feedback" device. But the very nature and purpose of the machine, and the nature of the association process, indicate that this feedback cannot be made to operate automatically.

Nothing is freer than the mind in making associations: anything can be associated by the mind with anything. The notorious fallibility of memory can be expressed as a tendency of the mind to forget associations previously made or experienced, or as a tendency to refer a newly created association to a past time. That is, the mind forgets observed associations and posits associations which never occurred.

The dictionary of associations corrects the fallibility of memory by presenting all associations in a system and only the associations in a system. But for any particular question or search put to a system of associations, certain of the associations may be irrelevant. This irrelevance is not a matter of logic but of purpose. So far as the dictionary is concerned, one association is as good as another; but for a particular purpose motivating a particular search, the purpose must guide the selection of terms constituting the network. Hence, as we have noted above, in order to make this purposive feedback possible, the associations at every step of the machine's operation must be displayed to the searcher or operator.


The different methods of display form a group of problems which will be handled in a separate paper, since in the balance of this paper we will be concerned only with those problems of the association machine which are analogous to the problems of the indexing machine.

In any system the actual associations will, of course, depend on the subject matter of each document. We can, however, make some statistical calculations based on reasonable assumptions. As in previous reports, for this purpose we will assume a collection of 50,000 items, a dictionary of 5,000 terms, and the use of 10 terms to analyze or index each document.

If any word in our dictionary is used only once to index one document, it will be associated with only nine other terms. The card for such a term would thus have punches for just nine other terms, the minimum for any term actually used. If, however, we assume a uniform use of the terms in the dictionary, each term will be used in the analysis or indexing of 100 documents:

$$\frac{50{,}000 \text{ documents} \times 10 \text{ terms per document}}{5{,}000 \text{ terms}} = 100$$

In this uniform system, the maximum number of punches on a term card will be nine times the number of documents indexed by that term, or 900.

We shall assume that the indexing of a document is independent of any other documents, and that all subjects are equally likely. Then the probability that term A is used to index a certain document is 1/500 (since A is used for 100 documents out of 50,000). If term B is independently chosen from the remaining 4,999 terms, each equally probable, then the probability that both A and B are entered for one document is 1/500 × 9/4,999 (A's document carries nine other entries), and the probability that they were not both used for that document is

$$1 - \frac{1}{500} \times \frac{9}{4{,}999} = 1 - \frac{9}{2{,}499{,}500}.$$

The probability that they are associated is the probability that they have been used together at least once, or one minus the probability that they were not used together for any of the 50,000 documents:

$$\mathrm{Prob}(A * B) = 1 - \left(1 - \frac{9}{2{,}499{,}500}\right)^{50{,}000} = 0.1648.$$


We should, therefore, expect each term to be associated with about 0.1648 × 4,999, or 824, other terms.
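The uniform-use figures can be checked numerically. The sketch below simply evaluates the report's own expressions under its stated assumptions (50,000 items, 5,000 terms, 10 terms per item, independent and equally likely choices); the variable names are illustrative.

```python
DOCS, TERMS, PER_DOC = 50_000, 5_000, 10

uses_per_term = DOCS * PER_DOC // TERMS        # documents indexed by each term
max_punches = (PER_DOC - 1) * uses_per_term    # nine co-terms per use, at most
p_a = uses_per_term / DOCS                     # 1/500: A indexes a given item
p_b_given_a = (PER_DOC - 1) / (TERMS - 1)      # 9/4,999: B among A's nine companions
p_pair = p_a * p_b_given_a                     # 9/2,499,500 per item

# Associated if paired on at least one of the 50,000 items.
p_assoc = 1 - (1 - p_pair) ** DOCS
expected_partners = p_assoc * (TERMS - 1)

print(uses_per_term, max_punches, round(p_assoc, 4), round(expected_partners))
# 100 900 0.1648 824
```

The expected 824 partners sit below the 900-punch maximum because some pairings recur across documents and are punched only once.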

It is undoubtedly not true, however, that if A has been used to index a particular document, the remaining 4,999 terms are equally probable choices for the other nine index entries. When one aspect of a subject is known, there are certain terms which are more likely to be needed to complete the description. Let us assume that a set of 999 terms (to be called M_A) is about 10 times as likely to be used with A as the other 4,000. Then the probability that A is associated with B is 0.474 if B is one of the 999 in M_A, and 0.0622 if B is one of the 4,000. The expected number of paired associations per term, or punches per card, is 0.474 × 999 + 0.0622 × 4,000, or 722, for these conditions.
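The report does not show how 0.474 and 0.0622 were obtained. One plausible reading, which is an assumption of this sketch and not the author's stated method, is that once A is used, each of its nine companions is drawn with weight 10 from M_A and weight 1 from the other 4,000 terms, so that a term's chance of being a companion is roughly proportional to its weight. Under that reading the calculation reproduces 0.474 and comes out at about 0.0623 and 723 punches, close to the text's 0.0622 and 722.

```python
DOCS, PER_DOC = 50_000, 10
P_A = 1 / 500                        # A indexes a given item (uniform use)
N_NEAR, N_FAR, WEIGHT = 999, 4_000, 10

# Assumption: per-companion inclusion probability proportional to weight.
TOTAL_WEIGHT = N_NEAR * WEIGHT + N_FAR       # 13,990


def p_assoc(weight):
    """Chance that A and a term of the given weight are paired on some item."""
    p_companion = (PER_DOC - 1) * weight / TOTAL_WEIGHT
    return 1 - (1 - P_A * p_companion) ** DOCS


p_near, p_far = p_assoc(WEIGHT), p_assoc(1)
punches = p_near * N_NEAR + p_far * N_FAR
print(round(p_near, 3), round(p_far, 4), round(punches))  # 0.474 0.0623 723
```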


If we do not know in advance whether B is among the 999 or the 4,000, the probability of its being associated with A is simply 722/4,999, or 0.144. The probability of an arbitrary third term C being associated with both A and B is (0.144)², or 0.0207. When the cards for A and B are superimposed, therefore, the most likely number of common holes is 104. Similarly, there will probably be just 15 terms associated with each of three others chosen at random, and only two with each of four terms.
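These estimates for randomly chosen terms follow from treating each association as independent with probability 722/4,999 and counting over the terms not already on the cards. A short check using only the report's numbers (the exact ratio rather than the rounded 0.144 gives the same whole numbers):

```python
P = 722 / 4_999    # chance that a random term is associated with A (about 0.144)
TERMS = 5_000

# Expected number of other terms associated with every one of k random terms.
for k in (2, 3, 4):
    print(k, round(P ** k * (TERMS - k)))
# 2 104
# 3 15
# 4 2
```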


It is rather academic, however, to consider terms chosen at random. As described above, the second term examined will ordinarily be chosen from among those associated with the first; the third, from those associated with both of these; and so on. The probability that C*B is greater if we know that A*B and C*A than it would be if we did not have these facts, and we must take this into account.

Let us suppose that we have chosen B from among the terms associated with A, and now superimpose the two cards to see which terms are associated with both. The number will depend on the relationship between A and B. If more of the 999 terms of M_A are also frequently used with B, more terms are likely to be associated with both A and B. We shall assume that M_A and M_B have 499 terms in common. Of these, (0.474)² × 499, or 112, will probably be associated with both A and B; of an additional 1,000 terms (those in only one of M_A and M_B), 0.474 × 0.0622 × 1,000, or 29.5; and of the remaining 3,500, (0.0622)² × 3,500, or 13.5. We therefore expect 155 common holes when A and B are superimposed.

The next step is to choose one of these terms, C, and superimpose its card on those of A and B. We shall assume that M_C also contains the 499 terms common to M_A and M_B, and that its remaining 500 terms are common to neither. Then the probable number of terms associated with all three is

$$(0.474)^3 \times 499 + 0.474 \times (0.0622)^2 \times 1{,}500 + (0.0622)^3 \times 3{,}000 \approx 57.$$

Similarly, superimposition of four cards narrows the field to (0.474)⁴ × 499 (the second and third terms become negligible), or 25; five cards, to 12; and six cards, to six terms (5.7 on the average), not counting the holes for the six being superimposed, which will also be punched on each card.
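The chain of estimates for networks built by successive selection can be reproduced by weighting each group of terms by 0.474 for every selected card whose set M contains it and by 0.0622 for every card whose set does not. The group sizes are the report's assumptions; the function name `expected_common` is illustrative.

```python
P_NEAR, P_FAR = 0.474, 0.0622   # report's figures: term inside / outside a card's set M


def expected_common(groups, k):
    """Expected number of terms showing through k superimposed cards.
    `groups` maps j -> how many terms lie in exactly j of the k sets M."""
    return sum(n * P_NEAR ** j * P_FAR ** (k - j) for j, n in groups.items())


two = expected_common({2: 499, 1: 1_000, 0: 3_500}, 2)    # A and B
three = expected_common({3: 499, 1: 1_500, 0: 3_000}, 3)  # A, B and C
# With four or more cards the report keeps only the leading term, 499 * 0.474**k.
more = {k: round(expected_common({k: 499}, k), 1) for k in (4, 5, 6)}
print(round(two), round(three), more)  # 155 57 {4: 25.2, 5: 11.9, 6: 5.7}
```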

The terms on the six cards form a network of associations, every pair being associated, as described above. At any stage of association, the network can be modified or enlarged by going back and changing the selection of terms to be superimposed.

It will be recalled that the indexing machine was, in effect, a three-dimensional body of information, with one dimension (depth) representing the terms of the system and the other two (height and width) representing the coordinates of any document number.
When the individual cards or sheets are language elements rather than terms, the depth of the solid (no. of terms) will be small as compared with the area of each sheet. In fact, the only restriction on reducing the number of sheets is the problem of superimposition. We cannot tolerate a situation in which more than half the holes on any sheet are punched, and we prefer having enough sheets to restrict the average density of punching to 1/3 of the holes on any one sheet.

The association machine is similarly a three-dimensional figure, which can be regarded as a cube of information (even though it may not be a physical cube). For in the association machine all dimensions measure terms in the system, and the depth of the solid, in terms, is always equal to the number of occupied dedicated positions.


[Figure: the association solid, with its dimensions labeled TERMS.]
The question we must now answer is this: should the cards or sheets in the association solid be language elements or terms?

With reference to the indexing machine, our decision to use language elements was based on a number of considerations, namely:

1. The low density of punching on a Batten card.

2. The size of a sheet necessary for a large collection of documents.

3. The reduction of a system from 5,000 term cards to 500 language element cards would not increase the density of use of any card beyond tolerable densities.

4. The size of the sheet necessary to handle a large collection of documents made it desirable to eliminate the necessity for adding sheets for new terms.

5. The use of small term cards (Batten cards) would necessitate additional sets of term cards whenever the number of documents in the system exceeded the capacity of a card.

6. There would always be some addition of new terms to the system, or sets, and no way of telling in which set any particular term will be found. Suppose, for example, a Batten system with 20,000 holes per card were used to catalog 100,000 items. There would be 5 sets of term cards, but the sets would not be absolutely uniform, since some terms would not be used in all sets. On any search, however, we would have to search all five sets.


When we turn from the indexing machine to the association machine, the same series of considerations leads to a decision to use term cards instead of language element cards.

1. It appears from the above statistical considerations that the density of posting of associated terms on any term card will be high.

2. Since the size of the sheet necessary for a large system is determined by the number of terms in the system, and not by the number of documents, the association card can be relatively small.

3. The reduction of a system from 5,000 term cards to 500 language element cards would increase the density of use of the cards beyond tolerable densities.

4. Since the sheets or cards are relatively small, the addition of new sheets for new terms does not present any unusual difficulties.

5. Since the number of positions dedicated on any sheet would provide for all actual and potential terms in the system, we would never require a new set of term cards, but only the addition of individual term cards.
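The contrast in punching density between the two machines can be put in numbers. The first two figures come from the report (100 uses per term over 50,000 item positions; about 722 punches over 4,999 term positions). The third is an illustration under an assumption of this sketch only: that folding 5,000 term cards onto 500 language-element sheets makes each sheet carry the punches of at least ten term cards on average, with holes falling independently.

```python
INDEX_DENSITY = 100 / 50_000       # indexing machine: Batten term card
ASSOC_DENSITY = 722 / 4_999        # association machine: term card
PREFERRED, LIMIT = 1 / 3, 1 / 2    # preferred average and tolerable maximum

# Assumption: each language-element sheet pools about 5,000 / 500 = 10 term cards,
# and a position is punched if any of the pooled cards is punched there.
CARDS_PER_SHEET = 5_000 // 500
pooled = 1 - (1 - ASSOC_DENSITY) ** CARDS_PER_SHEET

for name, d in [("indexing term card", INDEX_DENSITY),
                ("association term card", ASSOC_DENSITY),
                ("pooled association sheet", pooled)]:
    verdict = "preferred" if d <= PREFERRED else "tolerable" if d <= LIMIT else "too dense"
    print(f"{name}: {d:.3f} ({verdict})")
# indexing term card: 0.002 (preferred)
# association term card: 0.144 (preferred)
# pooled association sheet: 0.790 (too dense)
```

The association term card is about seventy times as densely punched as the indexing term card yet stays under the preferred one-third, while even this lightest pooling pushes a sheet well past the one-half limit.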

